Adaptive Testing
Adaptive Test is a set of general principles applied to specific IC manufacturing test methods with the explicit purpose of optimizing the value of test. These principles include the practice of using IC device or system manufacturing test data to change a device's or system's test binning, future test sequencing or the manufacturing material flow so as to decrease the cost of test, improve performance distributions and/or improve outgoing quality and reliability. The impact of Adaptive Test is product value optimization through increased test effectiveness, runtime efficiency and the creation of test flows for reconfigurable parts.
To use Adaptive Test effectively, new work is needed in test cell design, circuit design and data systems, and in the coordination between IC manufacturers, IC test and assembly. The top challenges facing industry application of Adaptive Testing are listed in Section H.
This whitepaper provides: 1) a description of Adaptive Test and the terminology used by its practitioners; 2) example applications of Adaptive Test as of 2015 and future opportunities; and 3) a list of challenges for the development and deployment of Adaptive Test.
To ensure the industry can fully exploit the benefits of Adaptive Test, this whitepaper describes: 1) the infrastructure requirements for test cells, data systems, and device and system designs; and 2) the Adaptive Test challenges and the coordination needed between IC manufacturers, OSATs (Outsourced Semiconductor Assembly and Test providers) and fabless system integrators.
1.1 Adaptive Test
The intent of Adaptive Test is to increase the quality and reliability of products (and potentially yield) and/or to improve the operational efficiency of testing. Adaptive Test can also help to identify abnormal parts as early in the manufacturing sequence as possible (preferably at wafer test) and to add tests or change test conditions to screen the riskier material.
Additional Adaptive Test methods may selectively skip test content on specific material to save cost. In doing so, Adaptive Test must account for the risk of passing bad parts.
1.1.1 Adaptive Test Definition
Adaptive Test comprises a set of methods for automatically changing manufacturing test conditions, manufacturing flow, test content, test limits, or test outcome to reduce test cost, increase outgoing quality and reliability, reconfigure parts, or collect data to further improve test and manufacturing. Adaptive Test makes these changes in a manner that does not significantly increase testing time or human involvement in test operations. The decisions on when and how to adapt the test are made algorithmically by the tester, other test cell equipment, or an automatic data analysis system, in a shorter time than the traditional test improvement cycle involving engineering analysis.
1.1.2 Adaptive Test Description
Adaptive Test is generally accepted as an advanced test strategy that can be used to achieve quality, yield, and cost goals that might not be reached by normal test methods. Adaptive Test may modify a production test process in any of five ways:
1) Test Conditions (modifying a voltage such as VDD, or the clock frequency)
2) Manufacturing Flows (adding or deleting test insertions such as burn-in)
3) Test Content (adding or deleting specific patterns or tests such as transition fault or IDDQ, respectively)
4) Test Limits (changing the pass/fail limits such as DC power or Vdd-min test specifications)
5) Test Outcomes (changing the binning of some die based on post-test analysis of the die’s test results)
Figure TST 7 - Adaptive Test supports feed-forward and feed-back data flows. Adaptive Test provides for data use for test decisions either in-situ (i.e. during a given test step) or post-test.
Adaptive Test applications are organized by when decisions are made to modify the test flow and to which device(s) the modified test flow is applied. The four most common categories are in-situ, feed-forward, feed-back and post-test. Figure 1 is a flow diagram depicting the relationships among these four categories.
1) In-situ: Data collected from the part being tested is used to modify the testing of the same device during the same test insertion. Speed grading is an example of the In-situ category, where data from the device is used to change the conditions of the test plan for the same device. Another example, common for analog components, is trim and device calibration.
2) Feed-forward: Data collected at a previous test step (e.g. probe, hot probe, burn-in) is used to change how the same parts are tested at a future step. Examples of the Feed-forward category are statistical methods that identify ‘risky’ dice or wafers and select (only) these components for burn-in, or that identify “clean” dice that may be candidates for reduced testing.
3) Feed-back: Data collected from a previous part (or parts) is used to modify the tests or limits of different devices yet to be tested. Skipping some test patterns on high-yield wafers, adding more tests to low-yield wafers and refining the statistical models used for die classification are examples of this category.
4) Post-Test: Sample statistics or other analysis is performed between test steps and is used to reclassify certain devices or to change the future manufacturing flow and test conditions for these devices. Part Average Testing and outlier identification methods are examples of the Post-Test category.
1.1.3 Example Applications
Below is a list of example Adaptive Test applications. Each example is labeled with one or more of the categories outlined earlier.
In addition to clarifying the categories, the examples demonstrate the shift from manual and static methods to automatic methods with little or no human intervention during test execution. Note that the use of an electronic die/chip ID (e.g., a die-specific identifier such as wafer/XY coordinate and lot information that is fused on each die) is a key enabler for many of these applications. A list of references that includes many example applications appears at the end of this section.
1) Dynamic test flow changes (In-situ, Feed-forward): Die production data is monitored within the test program to add or remove tests, to selectively perform per-die characterization for yield learning, or to collect data for later diagnosis. This application supports many common real-time Statistical Process Control methods.
2) Statistical screening (Post-Test, Feed-forward): After wafer or lot data collection, dice that are outliers or mavericks are identified as possible sources of test escape spikes or reliability failures. Statistical screening is Feed-forward because the results can be used to route target dice through test flows different from the main flow.
· PAT (Part Average Testing) is a statistical technique relating the test result of a device under test to the test results of the remaining dice on the wafer.
· NNR (Nearest Neighbor Residuals) is a statistical technique relating a univariate or multivariate test result to a model derived from a local region around the device under test.
3) Single-step flow control (Feed-forward): Data from one test step is used to optimize testing at the next test step, focusing subsequent screening on issues observed in the manufactured parts.
· For example, inline test modifies wafer test; wafer test modifies package test; burn-in modifies final test; or package test modifies card/system test.
4) Off-tester optimization of test flows (Feed-back): Off-tester data analysis drives test flow changes for future devices (fully automated).
· For example, off-line analysis could optimize test flows, test content and test measurement routines using input from many sources including historical data, test capacity, required turn-around times, DPM (defects per million) requirements, expected yields and parametric data.
5) Production monitors and alerts (In-situ, Feed-forward, Feed-back): Data from multiple sources is merged for statistical analysis to control production test optimization beyond what has historically been possible.
· For example, subtle parametric shifts caused by marginal wafer probe contact can be automatically identified and acted on during production testing.
6) Die matching (Feed-forward, Post-test): Production data from various sources is used to support the build/test process for multi-chip applications and many of today’s board build processes by matching specific die combinations during assembly.
· Note that die-matching data transfer may require world-wide data sharing across multiple companies and throughout the entire supply chain.
7) On-chip test structures and sensors (In-situ, Feed-forward, Feed-back): Data collected from auxiliary on-chip test structures such as ring oscillators, selected critical paths, on-chip voltage and thermal sensors, or on-chip reliability monitors is used to modify the die’s test content, test limits or future test flow.
· Sensor measurements can be used at all levels of assembly (including system operation) to monitor and adjust functionality.
8) On-chip configuration (In-situ, Feed-forward): Production test data (including test structure data) is used to adjust features in the design to improve the die’s performance, power, margin, yield or reliability.
· Emerging ICs have more on-chip configuration and adaptability, such as clock tuning, partial goods (redundant spare cores), and voltage and frequency adjustments (including per core).
9) Card/System configuration and test (Feed-forward, Post-test): Component test results (such as parametric data, yield characteristics or partial-good data) are used to customize the card/system test flow or the card/system test conditions.
· Architectures such as IEEE 1149.1-2013 standardize infrastructure that can be used to enable Adaptive Test applications at the board and system level.
· The feed-forward application enables specific fabrication process and component test parameters (e.g. Vdd-min or wafer x,y location) to be fed forward and used by the board test program to decide whether to add specific content to test for marginality.
· The feed-back and post-test applications enable a Pareto of board-level failures per Electronic Chip ID for a specified component type to be created and sent to the supplier, who can analyze whether failure types correlate with fab and test parameters. If they do, the supplier can adjust its tests or bins.
· In the in-situ application, reading an on-chip sensor for voltage or temperature enables more or less stressful board conditions to be applied to check for margin and performance. On-chip sensors can also be read during field usage to monitor aging, and the data can be sent back to suppliers to adjust their test limits.
10) Adaptive diagnostics (In-situ, Feed-forward): Test results drive advanced diagnostic data collection.
· For example, on-chip BIST (built-in self-test) circuitry can be programmed on-the-fly to localize and characterize specific types of failures. These methods must be applied selectively, however, to keep test time and cost reasonable.
· Many emerging chips have programmable on-chip test controllers that can interpret test results on-chip and take action (test, diagnostics, characterization) without requiring extensive data to be transmitted to or from the test equipment.
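The statistical screening application (item 2 above) can be illustrated with a minimal sketch of a dynamic Part Average Testing screen. It assumes per-wafer limits of median ± k robust sigmas, with the robust sigma estimated from the interquartile range. The function names, the multiplier k = 6, the die identifiers and the sample readings are all illustrative assumptions, not values prescribed by this whitepaper.

```python
import statistics


def dynamic_pat_limits(values, k=6.0):
    """Return (lower, upper) limits computed from one wafer's measurements.

    Assumption: robust sigma = IQR / 1.35, which approximates the standard
    deviation when the data is normally distributed.
    """
    q1, median, q3 = statistics.quantiles(values, n=4)
    robust_sigma = (q3 - q1) / 1.35
    return median - k * robust_sigma, median + k * robust_sigma


def screen_outliers(results, k=6.0):
    """Return the IDs of dice whose measurement falls outside the dynamic limits.

    `results` maps a die ID to a measurement for dice that already passed
    the static specification limits.
    """
    lower, upper = dynamic_pat_limits(list(results.values()), k)
    return sorted(die for die, value in results.items()
                  if not lower <= value <= upper)


# Hypothetical IDDQ readings (mA) for nine dice on one wafer.
wafer = {
    "d1": 1.00, "d2": 1.02, "d3": 0.98, "d4": 1.01, "d5": 0.99,
    "d6": 1.03, "d7": 0.97, "d8": 1.00, "d9": 1.60,
}
outliers = screen_outliers(wafer)
print(outliers)
```

With these sample readings only die d9 is flagged, even though it might still sit inside a wide static limit. The flagged dice could then be re-binned (Post-Test) or routed to additional screening such as burn-in (Feed-forward).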
1.1.4 Adaptive Test Architecture / Flows
Figure 2 displays a model of the entire End-to-End flow of parts under test and Adaptive Test applications. Note that there are feed-forward, in-situ, feed-back and post-test dispositioning opportunities at each test step. Although Figure 2 shows a simplified view of the database, the actual database structure would probably consist of two or three database levels, each with its own capacity and latency capabilities.
Figure TST 8 - The architecture of Adaptive Test organizes each insertion’s test data into one or more databases. A waterfall of manufactured parts may insert, join or query databases for test flow decision-making.
1.1.5 Levels of Adaptation & Data Model
Making the decision to adapt any of the test attributes listed above first involves collecting the right data (which test programs already do well) and then organizing the data into a structured data model so that the right data can be accessed when and where it is needed. At the appropriate time, data of the proper scope (that is, data from a particular test run or from a particular part, wafer or lot) is accessed from the data model and processed by the applicable decision algorithms. Similarly, the test variables, such as limits, conditions, flow and content, must be changed at the right time to complete the adaptation decision.
The data model can exist entirely in an off-line database apart from the tester, or be distributed between servers and the tester, depending on the latency requirements and convenience. To branch a test flow for a particular part (a real-time decision), latency must be short, i.e. there can be no significant impact on test time. To support low latency requirements, the data needs either to be stored on the tester or to be rapidly pulled into the tester. For an outlier decision, such as re-binning already tested parts, longer latencies are tolerable, for example from the time of test until the time the material is shipped.
Longer latencies mean an off-line database can be used.
Decisions to adapt a test are often based on comparing the variation observed on the sample of parts in question to a model of the expected variation. In outlier detection, parametric test limits are adapted to track expected variation so as to discard only parts with unexpected variation. Tests can be temporarily dropped from a test flow when their control charts show that the manufactured material and the test process are in control and within specification. If and when monitoring based on sample testing shows the material or test process has changed and gone out of control, the tests are reinstated on every part. Similarly, diagnostic tests can be added to a test flow when certain types of failures occur more frequently than expected.
Generally, more adaptability means more frequent decision-making in the test flow, with the goal of improving the trade-off between defect level from “freeing the guilty” and overkill from “convicting the innocent”, that is, between test false negative and test false positive error rates, respectively. Adaptability follows a bottom-up progression from the conventional static limit, to a static parameter variance model (static PAT), to a variance model with variable parameters (dynamic PAT), to choosing variance model equations based upon well-grounded principles. Moving up this progression requires not only more data but also a better understanding of the processes that cause the test response to vary. This progression also means the decision-making generally moves from an off-line human activity to an on-line machine activity.
1.2 Adaptive Test Infrastructure (data exchange, databases, etc.)
Adaptive Test makes decisions throughout the manufacturing process based upon data from multiple sources and using data of varying detail and completeness. Before actionable decisions can be made with data from multiple sources, new data integration requirements must be met.
Some integration requirements are unique to Adaptive Test and differ from the data requirements at any of the originating sources. Figure 2 highlights that Adaptive Test data requirements reach across all test insertions. Changed or new Adaptive Test data processing requirements are anticipated throughout silicon manufacturing and assembly. Example databases that merge and centralize Adaptive Test infrastructure are now coming on-line.
Data requirements that are different but not unique to Adaptive Test include date and time stamping, test naming, and data recording methods. For example, Adaptive Test date and time stamps should be consistent across all insertions and between companies. Current date stamping practices are ad hoc, with some companies using different date formats and date references at different test insertions. Standard time references such as Coordinated Universal Time exist for date stamping. A consistent time and date stamp policy eases integrating test data when some units are retested and simplifies merging two (or more) data sets in an unambiguous time order. Similar issues arise in recording floating point results of the same test from different insertions with different precisions and formats.
Data requirements unique to Adaptive Test center on the database policies of latency, access and retention period. Latency measures the time between the request for a data item and the availability of the requested item. Access refers to the scope of the user community that can store, retrieve, and act on a data item. Retention period measures the time the data item is electronically available.
* Local processing in the test cell requires low latency. For example, access latency should be a few milliseconds if data is to be retrieved on a per-device level (Real-Time Analysis & Optimization (RT A/O)). Post-Test Analysis & Dispositioning (PTAD) applications may have latency requirements of a few seconds to a few minutes.
Normally data volumes for these steps would be relatively small.
* Processing in a central database (e.g., “The Cloud”) has more relaxed timing constraints (minutes, hours), but typically deals with much larger data volumes.
* Since Adaptive Test decisions affect the quality of shipped goods, data retention requirements depend on specific market requirements (which may exceed 10 years in some cases).
Many areas of IC manufacturing are increasingly comfortable with using data from the cloud, but a notable exception is the test cell. Test cell integration of Adaptive Test algorithms is one of the most challenging applications. For example, local test cell actions (such as “clean probe-card now”) have been the sole responsibility of the specific test floor and were designed to guarantee test cell integrity and test cell-to-test cell correlation. Adaptive Test changes this paradigm in a number of ways:
* Algorithms will be owned by multiple involved parties, including the wafer fab, design house and test floor. Some algorithms may originate from commercial providers, others from the involved parties themselves. They all need to execute smoothly next to each other in a real-time environment.
* Data collection as well as data access (e.g., to upstream data in the case of data feed-forward) becomes a mission-critical task for the test operation as well as the entire supply chain. This challenges the reliability of the underlying infrastructure, which likely spans multiple companies, geographic areas and cultures.
* Users will likely want to simulate the impact of algorithms on historical databases to understand how to maximize the value of Adaptive Test without creating adverse side effects. This requires the exact same algorithm to be executed in environments as diverse as a cloud database and a test cell measuring real-time data.
As a consequence, industry needs to develop:
* APIs to allow algorithms to plug into a diverse set of environments.
* Data exchange formats that are flexible, compact and standardized so that only minimal extraction and translation effort is required. A common set of indices is required so that data remains identifiable and traceable even across heterogeneous supply chains.
* Recipe management systems that can handle a diverse set of recipe origins, check for (likely unintended) interactions and maintain consistency across non-synchronized update cycles from the various origins. Version control systems for these recipes are also required.
* Execution systems that monitor the health of Adaptive Test algorithms (are basic assumptions met?) and escalate errors to the right entities in an actionable format.
1.3 Implications for ATE and the Test Cell
The test cell is expected to deliver a cost-effective means to screen defects for quality, classify devices for performance and collect data for learning. Product complexity is increasing, with more clock domains, voltage planes, IOs and configurable fuses, followed by the introduction of parallel testing of multiple, dissimilar devices with 2.5D and 3D stacked packaging. In parallel, the business demand for higher quality and reduced product cost severely challenges the ability of the test cell to continue to provide an effective test solution while still reducing the cost of test. Adaptive Test methods provide levers to address these additional demands, but not without disruptions of their own.
Adaptive Test requires the test cell to be able to accept input from external and internal sources, apply device-specific models to determine test conditions, and evaluate results for flow control and device binning. This materially changes the setup, execution and categorization requirements of the test cell and affects both low-level software such as firmware and high-level software such as the executable test program.
Of particular challenge is the relationship between flow control and binning when the test flow becomes a function of non-deterministic evaluations influenced by dynamic test points, limits and device configuration.
Future test cells must support the following:
* Per-device test flows based on external inputs, the device itself, and dynamic business rules
* Zero-cost logging of data for use in the Adaptive Test process in a database-like format
* A move from being the entire cell controller to a test engine with a standard API
* Asynchronous and distributed (multi-insertion) test flows
* The ability to perform structural (model-based) testing over functional testing
1.4 Test results driving “Adaptive Designs”
More and more designs are being reconfigured during testing. Examples include partial goods (on-chip redundancy), VDD/frequency adjustment and local clock tuning. In most cases this product personalization will be based on either test measurements or data fed forward from other operations. In some cases, the reconfiguration will be based on “application demand”.
1.4.1 Testing resilient features
Resilient features are on-chip structures that can be used to configure the product to work around hard defects or to tolerate latent defects. These structures span a wide range of circuits and architectures, including fuses, redundant memory elements, architectures capable of operating on reduced numbers of logic elements such as CPU cores or graphics components, error-detection and -retry schemes for hard and soft errors, and the sensing and control circuitry for adaptive designs. Like every other circuit, these structures must themselves be tested and characterized, though they present unique testing challenges beyond standard logic and memory elements, including temporary configurations (for fuses), soft-repair vs.
hard-repair validation (for memories), combinatorial explosion of down-cored variants of redundant features, the need for error injection to test recovery circuits, and analog stimulus for sensors (such as voltage or aging monitors).
1.4.2 Non-deterministic device behavior: test and run-time availability
Non-determinism is incompatible with traditional cycle-accurate automated test equipment, but is nonetheless becoming typical on modern SOCs. Several new I/O protocols are non-deterministic, as are the standard methods to avoid metastability in clock-domain crossings (which are commonplace in highly integrated devices). Fine-grained power gating and power-state management can change the configuration of a device and its execution profile during test and normal operation. Adaptive designs take this notion even further, with architectural features that can perform state rollback and pipeline retry based on events at arbitrary times during execution. The result is that test patterns, particularly functional patterns that execute in mission mode, must either prevent or tolerate non-deterministic responses. The former raises coverage questions; the latter raises pattern and ATE interface challenges.
1.4.3 Testing Adaptive Designs
Adaptive designs bring the complexity of advanced power management (such as power gating), variable configuration of IP (such as IO and arrays), self-defining performance bucketing and part-specific reconfiguration (such as redundancy, repair and harvesting) to a test environment traditionally characterized by a linear test flow measuring at fixed corners to verify device operability. In an adaptive design, on-chip sensors instead detect the workload, voltage, temperature and timing margin of the chip as it operates, and the chip dynamically adjusts power supplies, clock frequencies, thermal control and even the instruction stream.
The adaptive features of a design make it much harder to define (and thus characterize) both typical and worst-case use models, which in turn makes it more difficult to test appropriately. Additionally, the removal of excess margin represented by traditional guard-banding increases the risk of exposure to subtle defects, necessitating both higher coverage and better correlation between structural and functional test modes.
An emerging direction is to apply Adaptive Test techniques (which modify the parameters or content of a test program based on information collected about the device under test from the current or previous test insertions) to adaptive designs (which modify their own operating point or execution flows based on internally generated feedback). The proclivity of an adaptive design to compensate for the environment in which it is being (functionally) tested will present challenges for data gathering by the Adaptive Test process beyond opening control loops to test at fixed conditions. A means is required to record and store the conditions under which the device is tested, organized for ease of retrieval and consumption.
1.5 Adaptive Manufacturing
An emerging direction is using test results to drive other IC production steps such as packaging multi-chip products. For example, the card/board assembly operation may require specific dies or types of dies to be used on specific boards based on previous test results. Given the emergence of multi-chip packages (such as 3DICs) and power constraints, specific bare dies will need to be selected for assembly based on parametric data collected at test, such as IDD or power/performance measurements.
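As an illustration of selecting bare dies for a multi-chip package from test data, the following minimal sketch pairs dies so that each two-die package stays within a power budget. The pairing rule (combine the highest-power remaining die with the lowest-power one), the `power_budget_mw` parameter, the `Die` record and the sample values are hypothetical assumptions for illustration only. A production flow would apply its own selection criteria to the data fed forward by electronic chip ID.

```python
from typing import NamedTuple


class Die(NamedTuple):
    chip_id: str     # electronic chip ID read at test (hypothetical format)
    power_mw: float  # power measured at wafer test


def match_dies(dies, power_budget_mw):
    """Greedily pair dies so that each pair's summed power fits the budget.

    Returns (pairs, unmatched). A die that exceeds the budget even when
    combined with the lowest-power remaining die is left unmatched.
    """
    ordered = sorted(dies, key=lambda d: d.power_mw)
    lo, hi = 0, len(ordered) - 1
    pairs, unmatched = [], []
    while lo < hi:
        if ordered[lo].power_mw + ordered[hi].power_mw <= power_budget_mw:
            pairs.append((ordered[lo].chip_id, ordered[hi].chip_id))
            lo += 1
        else:
            unmatched.append(ordered[hi].chip_id)
        hi -= 1
    if lo == hi:  # one die left over with no partner
        unmatched.append(ordered[lo].chip_id)
    return pairs, unmatched


# Hypothetical wafer-test power data for five dies.
dies = [Die("A", 400.0), Die("B", 650.0), Die("C", 300.0),
        Die("D", 700.0), Die("E", 500.0)]
pairs, unmatched = match_dies(dies, power_budget_mw=1000.0)
print(pairs, unmatched)
```

With this sample data, dies C and D and dies A and E are paired, and die B is left for a different package type or flow. The sketch ignores performance matching and traceability, both of which a real die-matching flow would need.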
Key challenges of End-to-End data feed-forward for assembly operations include:
· Cross-company data management
· Robust data availability
· Data security
· Full traceability
· Data format standardization
1.6 Adaptive Test for Card/System/Field
The Adaptive Test methodologies described in this document for IC-level testing can equally be extended and applied to board and system testing and even to field usage. While ICs have traditionally been tested standalone in an almost ‘noise-free’ ATE environment and/or tested with limited structural tests to see whether they perform to their specifications, the board/system environment can be quite different in terms of noise, timing margin, voltage and functional test trigger conditions that structural tests were unable to produce. Board yield and IC DPM can be improved significantly when Adaptive Test that includes board/system-level performance is able to drive enhanced screening at the IC supplier’s test and/or at board/system manufacturing test. The four types of Adaptive Test described in Section 2 (In-situ, Feed-forward, Feed-back and Post-test) can all be extended to include board and system manufacturing.
One of the difficulties in extending chip-level Adaptive Test to board/system or even in-field test is tracking the respective test trigger conditions and being able to convert between them. For example, a chip-level scan-based logic gate test may not always be applicable to board/system/in-field tests because it is difficult or impossible to control the scan chain data and clock pulses, to stop in-field online function execution, and so on. Similarly, a functional execution, which can be treated as a functional test, may be hard to convert to a chip-level ATE test because the function execution could involve memory contents and their transactions, logic and I/O data flow, etc.
Therefore, tracking the test/failure conditions and being able to convert between them is the key to extending Adaptive Test to the board/system level.
Extending Adaptive Test applications to the board and system level requires extensive data infrastructure, analysis, exchange and security. Companies providing ICs, board design and test need to collaborate openly at a technical and business level to be successful.
1.7 Adaptive Test Challenges and Directions
This section highlights the key challenges that the industry must address to fully exploit Adaptive Testing across the supply chain. The color scheme of the table below is:
· White: Manufacturable solutions exist.
· Yellow: Some solutions may be known but are not widely accepted or mature.
· Red: Manufacturable solutions are not known, or no solution is an industry standard (i.e., known solutions are proprietary).
Figure TST10 - Adaptive Test Challenges
1.8 Summary
Adaptive Testing has the opportunity to improve product quality and reliability, reduce cost and improve product yield beyond today’s capabilities. Almost all companies are starting to use some form of Adaptive Testing, but there is no sequential roadmap for implementation and many applications are created in an ad hoc way.
A number of challenges today limit the industry’s ability to fully exploit Adaptive Testing across the supply chain. (These are highlighted in the table in the previous section.) The industry must collectively address these challenges in the next few years.